For many systems characterized as “complex” the patterns exhibited on different scales differ 
markedly from one another. For example the biomass distribution in a human body “looks very 
different” depending on the scale at which one examines it. Conversely, the patterns at different 
scales in “simple” systems (e.g., gases, mountains, crystals) vary little from one scale to another. 
Accordingly, the degrees of self-dissimilarity between the patterns of a system at various scales 
constitute a complexity “signature” of that system. Here we present a novel quantification of 
self-dissimilarity. This signature can, if desired, incorporate a novel information-theoretic measure of 
the distance between probability distributions that we derive here. Whatever distance measure is 
chosen, our quantification of self-dissimilarity can be measured for many kinds of real-world data. 
This allows comparisons of the complexity signatures of wholly different kinds of systems (e.g., 
systems involving information density in a digital computer vs. species densities in a rain-forest 
vs. capital density in an economy). Moreover, in contrast to many other suggested complexity 
measures, evaluating the self-dissimilarity of a system does not require one to already have a model 
of the system. These facts may allow self-dissimilarity signatures to be used as the underlying 
observational variables of an eventual overarching theory relating all complex systems. To illustrate 
self-dissimilarity we present several numerical experiments. In particular, we show that the underlying 
structure of the logistic map is picked out by the self-dissimilarity signature of time series produced 
by that map. 


I. INTRODUCTION 

The search for a measure quantifying the intuitive no- 
tion of the “complexity” of systems has a long history 
[1, 6]. One striking aspect of this search is that for al- 
most all systems commonly characterized as complex, the 
spatio-temporal patterns exhibited on different scales dif- 
fer markedly from one another. Conversely, for systems 
commonly characterized as simple the patterns are quite 
similar. 

The Earth climate system is an excellent illustration, 
having very different dynamic processes operating at 
all spatiotemporal scales, and typically being viewed as 
quite complex. Complex human artifacts also share this 
property, as anyone familiar with large-scale engineering 
projects will attest. Conversely, the patterns at differ- 
ent scales in “simple” systems like gases and crystals do 
not vary significantly from one another. It is the self-similar 
aspects of simple systems, as revealed by allometric 
scaling, scaling analysis of networks, etc. [7], that 
reflect their inherently simple nature. Due to this self-similarity, 
the pattern across all scales can be encoded in 
a short description for simple systems, unlike the pattern 
for complex systems. 

Accordingly, it is the self-dissimilarity (SD) between 
the patterns at various scales that constitutes the com- 
plexity “signature” of a system [11]. Intuitively, such a 
signature tells us how the information and its process- 
ing [2] at one scale in a system is related to that at the 
other scales. Highly different information processing at 
different scales means the system is efficient at encoding 
as much processing into its dynamics as possible. In con- 
trast, having little difference between the various scales, 
i.e., high redundancy, is often associated with robustness. 


The simplest version of such a signature is to reduce 
all of the patterns to a single number measuring their 
aggregate dissimilarity. This would be analogous to conventional 
measures which quantify a system’s “complexity” 
as a single number [12]. We can use richer signatures, 
however. One is the symmetric matrix of the dissimilarity 
values between all pairs of patterns at different scales. 
More generally, say we have a dissimilarity measure that 
can be used to quantify how “spread out” a set of more 
than two patterns is. Then we can measure the spread 
of triples of scale-indexed patterns, quadruples, etc. In 
such a situation the signature could be a tensor (e.g., a 
real number for each possible triple of patterns), not just 
a matrix. 

SD signatures may exploit model-based understanding 
about the system generating a data set of spatio-temporal 
patterns (for example, to statistically extend that data 
set). However they are functions of such a data set rather 
than of any model of the underlying system. So in con- 
trast to some other suggested complexity measures, with 
SD one does not need to understand a system and then 
express that understanding in a formal model in order 
to measure its complexity. This is important if one’s 
complexity measure is to serve as a fundamental obser- 
vational variable used to gain understanding of particular 
complex systems, rather than as a post-hoc characterizer 
of such understanding. 

Indeed, one application of SD is to (in)validate mod- 
els of the system that generated a dataset, by compar- 
ing the SD signature of that dataset to the signature of 
data generated by simulations based on those models. 
Model-independence also means that the SD complexity 
measure can be applied to a broad range of (data sets 
associated with) systems found in nature, thereby potentially 
allowing us to compare the processes underlying 
those types of systems. Such comparisons need not involve 
formal models. For example, an SD signature provides 
us with machine learning features synopsizing a dataset 
[3]. These features can be clustered, thereby revealing re- 
lationships between the underlying systems. We can do 
this even when the underlying systems live in wholly dif- 
ferent kinds of spaces, thereby generating a taxonomy of 
“kinds of systems” that share the same complexity char- 
acter. SD signatures can also serve as supervised learning 
predictor variables for extrapolating a dataset (e.g., into 
the future). In all this, SD signatures are “complexity- 
based” analogues of traditional measures used for these 
purposes, e.g., power spectra. 

The first formalization of SD appeared in [11]. This pa- 
per begins by motivating a new formalization. We then 
present several examples of that formalization. Next we 
present a discussion of information theoretic measures 
of dissimilarity between probability distributions, an im- 
portant issue of SD analysis. We end by illustrating SD 
analysis with several computer experiments [13]. 


II. FORMALIZATION OF 
SELF-DISSIMILARITY 


There are two fundamental steps to constructing the 
SD signature of a dataset. 

The first step is to quantify the scale-dependent pat- 
terns in the dataset. We want to do this in a way that 
treats all scales equally (rather than taking the pattern 
at one scale to be what’s “left over” after fitting the pat- 
tern at another scale to a data set, for example). We 
also want to minimize the a priori structure and associ- 
ated statistical artifacts introduced in the quantification 
of the patterns. Accordingly, we wish to avoid the use 
of arbitrary bases, and work with entire probability dis- 
tributions rather than low-dimensional synopses of such 
distributions. 

The second fundamental step in forming a SD signa- 
ture is numerically comparing the scale-dependent pat- 
terns, which for us means comparing probability distri- 
butions. We illustrate these steps in turn. 


A. Generation of scale-indexed distributions 

1. Let q* be the element in a space Q_0 whose 
self-dissimilarity interests us. Usually q* will be a data 
set, although the following holds more generally. 

2. Typically there is a set of transformations of q* 
that we wish our SD measure to ignore. For example, 
we might want the measure to give the same 
value when applied both to an image and to a slight 
translation of that image. We start by applying 
those transformations to q*, thereby generating a 
set of elements of Q_0 “cleansed” of what we wish 
to ignore. Formally, we quantify such an invariance 
with a function g that maps any q_0 ∈ Q_0 to the set 
of all elements of Q_0 related by our invariance to 
that q_0. Working with the entire set g(q*) rather 
than a lower-dimensional synopsis of that set avoids 
introducing statistical artifacts and the issue of how 
to choose the synopsizing function. 

3. In the next step we apply a series of scale-indexed 
transformations to the elements in g(q*) (e.g., 
magnifications to different powers). The choice of 
transformations will depend on the precise domain at 
hand. Intuitively, the scale-indexed sets produced 
by these transformations are the “patterns” at the 
various scales. They reflect what one is likely to see 
if the original q* were “examined at that scale”, and 
if no attention were paid to the transformations we 
wish to ignore. 

We write this set of transformations as the 
θ-indexed set W_θ : Q_0 → Q_1 (θ is the generalized 
notion of “scale”). So formally, the second step of 
our procedure is the application of W_θ to the 
elements in the set g(q*) for many different θ values. 
After this step we have a θ-indexed collection of 
subsets of Q_1. 

Note that we again work with full distributions 
rather than synopses of them. This allows us to 
avoid spatial averaging or similar operations in the 
W_θ, and thereby avoid limiting the types of Q_0 on 
which SD may be applied, and to avoid introducing 
statistical biases. 

4. At this point we may elect to use machine learning 
and available prior knowledge [3] to transform 
the pattern of each scale (a set) into a single 
probability distribution, p^θ. This last step, which 
we use in our experiments reported below, can often 
help us in the subsequent quantification of the 
dissimilarities between the scales’ patterns. More 
generally, if one wishes to introduce model-based 
structure into the analysis, it can be done through 
this kind of transformation. [14] 

B. Quantifying dissimilarity among multiple 
probability distributions: 

Applying the preceding analysis to a q* will give us a 
collection of sets, {W_θ[g(q*)]}, one such set for each value 
of θ. All elements in all those sets live in the same space, 
Q_1. It is this collection as a whole that characterizes the 
system’s self-dissimilarity. 
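
As a rough illustration of the procedure of Sec. II A, the following Python sketch builds the collection {W_θ[g(q*)]} for a one-dimensional data set. It assumes, purely for illustration, that g is the set of circular translations of q* and that W_θ keeps every θ-th sample and then truncates to a fixed window. These choices and all function names are hypothetical; they are one possible instance of the framework, not the construction used in the experiments.

```python
def translations(q):
    """g(q): all circular shifts of q (the invariance we wish to ignore)."""
    return [q[s:] + q[:s] for s in range(len(q))]


def decimate_and_window(q, theta, width):
    """W_theta: keep every theta-th sample, then truncate to `width` samples."""
    return tuple(q[::theta][:width])


def scale_indexed_sets(q_star, thetas, width):
    """Return {theta: set of windowed patterns}, one subset of Q_1 per scale."""
    shifted = translations(list(q_star))
    return {
        theta: {decimate_and_window(q, theta, width) for q in shifted}
        for theta in thetas
    }


# Example: a period-4 binary sequence examined at scales 1 and 2.
q_star = [1, 1, 0, 0] * 4
patterns = scale_indexed_sets(q_star, thetas=[1, 2], width=3)
```

Here `patterns[1]` holds the four distinct length-3 windows of the raw sequence, while `patterns[2]` holds only the two alternating windows seen after decimation by 2, so the two scales already look different as sets.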

Note that different domains will have different spaces 
Q_1. So to be able to use SD analysis to relate many different 
domains, we need to distill each domain’s collection 
{W_θ[g(q*)]}, consisting of many subsets of the associated 
Q_1, into values in some common space. In fact, often 
there is too much information in a collection of values 
for it to be a useful way of analyzing a system; even 
when just analyzing a system by itself, without comparing 
it to other systems, often we will want to distill its 
collection down to a set of real numbers. 

Since what we are interested in is the dissimilarity of 
the subsets in any such collection, the natural choice for 
such a common space is one or more real numbers mea- 
suring how “spread out” the subsets in any particular 
collection are. More precisely, at a minimum we want 
to use this measure both to quantify the aggregate dis- 
similarity of the entire collection, and to quantify the 
dissimilarity between any pair of subsets from the collec- 
tion. Most generally, we would like to be able to use the 
measure to quantify the dissimilarity relating any n-tuple 
of subsets from the collection. 

Ideally then, such a measure ρ should: 

1. Obey the usual properties of a metric when it takes 
two arguments, and more generally obey the re- 
quirements for when there are more than two argu- 
ments (and even when those arguments are them- 
selves sets of multiple points) [10]; 

2. Be finite even for the delta-function distributions 
commonly formed from small data sets; 

3. Be quickly calculable even for large spaces; 

4. Have a natural interpretation in terms of the to- 
tal amount of information stored in its (probability 
distribution) arguments. 

Until recently, perhaps the measure best satisfying 
these desiderata was the Jensen-Shannon (JS) distance 
[2], i.e., the entropy of the average of the distributions 
minus the average of their entropies. However this mea- 
sure fails to satisfy 1. In Section IV we present an alter- 
native, which like JS distance obeys 3 and 4, and may be 
better suited to SD analysis. Recent work has uncovered 
many multi-argument versions of distance, called multimetrics 
[10]. These obey 1 and 2 by construction, 
and many of them obey 3 as well. These are what we ac- 
tually use in our experiments. However the multimetrics 
uncovered to date do not obey 4. 
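
The JS distance described above (the entropy of the average of the distributions minus the average of their entropies) can be written in a few lines. The sketch below is a minimal Python version for discrete distributions given as equal-length lists of probabilities; the function names are ours and natural logarithms are assumed.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


def js_distance(distributions):
    """Entropy of the average distribution minus the average entropy."""
    k = len(distributions)
    mean = [sum(column) / k for column in zip(*distributions)]
    return entropy(mean) - sum(entropy(p) for p in distributions) / k


# Two delta-function distributions with disjoint support.
d = js_distance([[1.0, 0.0], [0.0, 1.0]])
```

Note that the value stays finite for delta-function arguments (desideratum 2) and that the function accepts any number of distributions at once.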

III. EXAMPLES 

To ground the discussion we now present some exam- 
ples of the foregoing: 

Example 1: Q_0 is the space of real-valued functions over 
a Euclidean space X, e.g., a space of images over x ∈ X. 

If we wish our measure to ignore a set of translations over 
X then g(q_0) is that set of translations of image q_0. Thus 
if q* = f(x) then g(q*) is the set {f(x − x_1), f(x − x_2), ...} 
where the x_i are translation vectors. Each W_θ may be 
magnification by θ followed by windowing about the origin 
so that only the local structure of the image around 
x_i is considered. If T is an operator which truncates 
an image f(x) to a window around the origin then 
W_θ(g(q_0)) = {T[f(θ(x − x_1))], T[f(θ(x − x_2))], ...}. So each 
q_{θ,i} = T[f(θ(x − x_i))] is a real-valued function over a 
subspace of X. 

We can then have ρ be any measure that can compare 
two sets of real-valued functions over X. In particular, 
we can discretize X into n bins to convert each such 
function into an element of R^n. In this way each scale’s 
set of functions gets converted into a set of Euclidean 
vectors. 

While multimetrics generalize to distances between ob- 
jects which are not probability densities, to apply the JS 
or Kullback-Leibler (KL) distance [2] to our scale-indexed 
sets of vectors we need to convert them to probabilities. 
If the range of the functions over X making up Q_0 were 
finite rather than all of R, our “vectors” would be 
fixed-length strings over a finite alphabet (see Ex. 2). In this 
case we could convert each set of “vectors” to a probability 
simply by setting that probability to be uniform 
over the elements of the set and zero off it. For real-valued 
vectors this is typically not possible, and we must 
run a density-estimation algorithm to convert each set of 
vectors in R^n into a probability density across R^n. 

However they are produced, we need a way to convert 
our resultant sets into an SD signature. The simplest 
approach is to form the symmetric matrix of all pairwise 
comparisons whose i, j element is the multimetric (or JS 
distance or what have you) between the probability 
distribution for scale θ_i and that for scale θ_j. 
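
A minimal sketch of this pairwise signature follows. It assumes each scale has already been reduced to a discrete probability vector and takes the distance as a caller-supplied function; the Euclidean distance used in the example is only a stand-in for whichever measure one prefers, and all names are hypothetical.

```python
import math


def pairwise_signature(distributions, distance):
    """Symmetric matrix whose (i, j) entry compares scale i with scale j."""
    n = len(distributions)
    signature = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = distance(distributions[i], distributions[j])
            signature[i][j] = signature[j][i] = d
    return signature


def euclidean(p, q):
    """Plain L2 distance between two probability vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Three scales, each summarized by a two-outcome distribution.
scales = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
signature = pairwise_signature(scales, euclidean)
```

Only the upper triangle is computed and then mirrored, which relies on the distance being symmetric in its two arguments.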

All of this can be naturally extended to “images” that 
are not real-valued functions, but instead take on values 
in some other space (e.g., of symbols, or of matrices). 
For example, an element of Q_0 could be the positions of 
particles of various types in R^3. 

Note that q* may itself be generated from an 
observational windowing process. This may be accounted for 
in a likelihood model P(D | q_0) which smooths intensities 
and admits Gaussian noise. 

Example 2: This example is a variant of Ex. 1, but 
is meant to convey the generality of what “scale” might 
mean. We have the same Q_0 and g as in Ex. 1. However 
say we are not interested in comparing a q* to a scaled 
version of itself. Instead, each θ represents a set of n 
vectors {v_i(θ) ∈ X}. Then have W_θ(q_0) be the m-vector 
“stencil” (q_0(v_1(θ)), q_0(v_2(θ)), ..., q_0(v_m(θ))). Then we 
could have ρ be any distance measure over sets of vectors 
in Q_1 = R^m, as discussed in Ex. 1. (The difference with 
Ex. 1 is that here we arrived at those vectors without 
any binning.) 

As an example, we could have stencils consist of two 
points, with v_1 = 0 for all θ, and then have v_2 = ka, 
where k is a scalar, and the vector a is the same for all 
θ. In this example W_θ isolates a pair of points separated 
by a multiple k of the vector a; changing θ changes that 
multiple. So our self-dissimilarity measure quantifies how 
the patterns of pairs of points in f separated by ka change 
as one varies k. Another possibility is to have v_2 = R_k(a), 
where R_k(·) is rotation by k. In this case our measure 
quantifies how the patterns of pairs of points change as 
one rotates the space. 

Another important modification is to allow n > 2, so 
that we aren’t just looking at pairs of points. In 
particular, say X is N-dimensional, and have v_i = k a_i for all i, 
where each a_i is a vector in X, a_1 equaling 0 and k 
being the scale, as usual. Then we might want to have the 
distances between any pair of points in a scale’s stencil, 
|k a_i − k a_j|, be a constant times k, independent of i and 
j. This would ensure there is no “cross-talk” between 
scales; all distances in a scale’s stencil are identical. To 
obey this desideratum requires that the underlying 
stencil {a_i} be a tetrahedron, of at most N + 1 points. 
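
The equal-distance requirement can be checked numerically. The sketch below builds one particular N + 1 point stencil in N dimensions (the N unit basis vectors plus a point on the diagonal, translated so that the first point sits at the origin) and measures all pairwise distances after scaling by k. The construction and the function names are our own illustrative choices, not taken from the text.

```python
import math
from itertools import combinations


def simplex_stencil(n_dim):
    """N + 1 mutually equidistant points in R^N, with the first at the origin."""
    # Diagonal point chosen so it is sqrt(2) away from every unit basis vector.
    c = (1 + math.sqrt(n_dim + 1)) / n_dim
    points = [[1.0 if j == i else 0.0 for j in range(n_dim)] for i in range(n_dim)]
    points.append([c] * n_dim)
    origin = points[0]
    return [[x - o for x, o in zip(p, origin)] for p in points]


def scaled_distances(stencil, k):
    """All pairwise distances |k a_i - k a_j| for the stencil at scale k."""
    return [
        math.dist([k * x for x in a], [k * x for x in b])
        for a, b in combinations(stencil, 2)
    ]


stencil = simplex_stencil(3)
distances = scaled_distances(stencil, k=2)
```

Every entry of `distances` equals the same constant times k, which is the "no cross-talk" property the paragraph asks for.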

Example 3: This example is the same as Ex. 2, except 
that X is an M-dimensional infinite lattice rather than a 
Euclidean space, and the W_θ are modified appropriately. 
For instance, we could have M = 1 and have 
symbolic-valued functions f, so that an element of Q_0 is a symbolic 
time series. Take n = 2, with v_1 = 0, and v_2 = k, k now 
being an integer. Since the range of f is now a finite set 
of symbols rather than the reals, we do not need to do 
any binning or even density estimation; each W_θ(g(q*)) is 
a histogram, i.e., it is already a probability distribution. 

Since distributions now are simply vectors in a 
Euclidean space, we can measure their dissimilarity with 
something as unsophisticated as L_2 distance. Alternatively, 
as before, we can compare scales by using JS 
distance for ρ. In this case our SD measure is an 
information-theoretic quantification of how time-lagged 
samples of the time series q_0 differ from each other as 
one changes the lag size. 
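
To make Ex. 3 concrete, here is a small Python sketch for n = 2: it histograms the pairs of symbols separated by a lag k and compares two lags with a Euclidean distance. The helper names and the toy series are hypothetical, and the choice of Euclidean distance is just the simplest option mentioned above.

```python
import math
from collections import Counter


def lagged_pair_distribution(series, lag, alphabet):
    """Histogram of (symbol at t, symbol at t + lag), as a probability vector."""
    counts = Counter(zip(series, series[lag:]))
    total = sum(counts.values())
    return [counts[(a, b)] / total for a in alphabet for b in alphabet]


def l2_distance(p, q):
    """Euclidean distance between two probability vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# A period-2 symbolic time series.
series = "ab" * 50
p1 = lagged_pair_distribution(series, 1, "ab")
p2 = lagged_pair_distribution(series, 2, "ab")
p3 = lagged_pair_distribution(series, 3, "ab")
```

For this periodic series the odd lags 1 and 3 give nearly identical histograms, while lag 2 gives a very different one, so the pairwise signature repeats with the period of the data.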

Having n > 2 allows even more nuanced versions of 
this quantification. Furthermore, other choices of ρ 
(described below) allow it to take more than two sets at once 
as arguments. In this case, ρ takes an entire set of 
time-lagged samples, running over many time lags, and 
measures how “spread out” the members of that full set are. 

These measures complement more conventional 
information-theoretic approaches to measuring how the 
time-lagged character of q_0 varies with lag size. A typical 
such approach would evaluate the mutual information 
between the symbol at a random point in q_0 and the symbol 
k away, and see how that changes with k. Such an 
approach compares singletons: it sees how the distribution 
of symbols at a single point is related to the 
distribution of symbols at the single time-lagged version of that 
point. These new measures instead allow us to compare 
distributions of n-tuples to one another. 
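
For comparison, the conventional lagged mutual information mentioned above can be estimated from frequency counts as follows. This is a minimal plug-in estimator in Python (natural logarithms, no bias correction); the function name is ours.

```python
import math
from collections import Counter


def lagged_mutual_information(series, lag):
    """Plug-in estimate of I(symbol at t; symbol at t + lag), in nats."""
    pairs = list(zip(series, series[lag:]))
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum(
        (c / n) * math.log((c / n) / ((left[a] / n) * (right[b] / n)))
        for (a, b), c in joint.items()
    )


mi_periodic = lagged_mutual_information("ab" * 50, lag=2)
mi_constant = lagged_mutual_information("a" * 20, lag=1)
```

The result is a single number per lag, in contrast to the SD signature, which compares whole distributions of n-tuples across lags.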

Example 4: This is a dramatically different example 
to show that self-dissimilarity can be measured for quite 
different kinds of objects. Let Q_0 be a space of networks, 
i.e., undirected graphs with labeled nodes. Have g(q_0) be 
the set of relabelings of the nodes of network q_0. Such 
relabelings are what we want the SD analysis to ignore. 


Have each W_θ run a decimation algorithm on q_0, with θ 
parameterizing the precise algorithm used. Each such 
algorithm iteratively grows outward from some fixed starting 
(θ-independent) node a, tagging some nodes which 
it passes over, and removing other nodes it passes over. 
Changing θ changes parameters of the algorithm, e.g., 
changes which iterations are the ones at which nodes are 
removed. Intuitively, each algorithm W_θ demagnifies the 
network by decimation, and then windows it. Different 
W_θ demagnify by different amounts. 

More precisely, at the start of each iteration t, there is 
a subset of all the nodes that are labeled the “current” 
nodes for t. Another subset of nodes, perhaps overlapping 
those current at t, constitutes the “tagged” nodes. 
During the iteration, for each current node i, a set of 
non-tagged nodes S_t(i) is chosen based on i. For example, 
this could be done by looking at all non-tagged nodes 
within a certain number of links of i. Then a subset of 
the nodes in S_t(i) is removed, with compensating links 
added as needed. The remaining nodes are added to the 
set of tagged nodes, and a subset of them are added to a 
set of nodes that will be current for iteration t + 1. Then 
the process repeats. 

At the earliest iteration at which the number of tagged 
nodes is at least N, the iterations stop, and all remaining 
nodes in q_0 are removed. Some fixed rule is then used for 
removing any excess nodes to ensure that the final net 
has exactly N nodes. (Typically N is far smaller than 
the number of nodes in q_0.) ρ can then be any algorithm 
for measuring distance between sets of identically-sized 
networks. 


IV. DISSIMILARITY OF PROBABILITY 
DISTRIBUTIONS 

In the experiments presented below, we use one of the 
multimetrics discussed in [10]. However other measures 
could be used, and in particular it is worth briefly dis- 
cussing measures derived from information-theoretic ar- 
guments concerning the distance between probability dis- 
tributions. 

The most commonly used way to define a distance 
between two distributions is their KL distance. This is 
the infinite limit log-likelihood of generating data from 
one distribution but mis-attributing it to the other 
distribution. Unfortunately, the KL distance between two 
distributions can be infinite when one distribution is zero 
at points where the other is not; it violates the triangle 
inequality; and it is not even a symmetric function of its two 
arguments. (It is non-negative though, equaling zero iff 
its two arguments are identical.) 
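
The shortcomings listed above are easy to see numerically. The following Python sketch implements the KL distance for discrete distributions, using the standard convention that terms with zero probability in the first argument contribute nothing; the function name is ours.

```python
import math


def kl_distance(p, q):
    """KL distance from p to q, in nats; infinite if q vanishes where p does not."""
    total = 0.0
    for a, b in zip(p, q):
        if a > 0:
            if b == 0:
                return math.inf
            total += a * math.log(a / b)
    return total


# Asymmetry: swapping the arguments changes the value.
forward = kl_distance([0.9, 0.1], [0.5, 0.5])
backward = kl_distance([0.5, 0.5], [0.9, 0.1])

# Divergence: the second argument is zero where the first is not.
blown_up = kl_distance([0.5, 0.5], [1.0, 0.0])
```

The last case is the one that matters for small data sets, where empirical distributions are often delta functions or have empty bins.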

Some proposals have been made for overcoming some 
of these shortcomings. In particular, the JS distance 
between two distributions does not blow up and is 
symmetric. However it violates the triangle inequality [4, 9]. 
A more important problem for us is that it is not clear 
that JS distance is the proper information-theoretic 
measure for SD analysis. To illustrate this it helps to consider 
an alternative information-theoretic measure for distance 
between probability distributions, by modifying the type 
of reasoning originally employed by Shannon. 

Say we have a set of K distributions {π^i}. (For us that 
set is generated by application of g and the members of 
{W_θ}, as discussed above.) Intuitively, our alternative to 
JS distance quantifies how much information there is in 
the knowledge of whether a particular x was generated 
from one member of {π^i} or another. To do this we 
subtract two terms, each being an average over all possible 
K-tuples of x values, (x_1, x_2, ..., x_K). 

The summand of the first average is the Shannon 
information in (x_1, x_2, ..., x_K) when that K-tuple is 
produced by simultaneously sampling each of the K 
distributions, so that each x_i is a sample of the associated 
π^i. The summand of the second average is the information 
in (x_1, x_2, ..., x_K) according to the “background” 
version of the joint distribution, in which all information 
about which distribution generated which x is averaged 
out. Intuitively, the difference in these averages tells us 
how much information there is in the labels of which 
distribution generates which x: 


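\[
\rho(\{\pi^i\}) = -\sum_{x_1, x_2, \ldots, x_K} \left[\prod_{i=1}^{K} \pi^i(x_i)\right] \ln\left[\frac{\frac{1}{K!}\sum_{P} \prod_{i=1}^{K} \pi^i\big([P\{x_j\}]_i\big)}{\prod_{i=1}^{K} \pi^i(x_i)}\right] \tag{1}
\]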


where the Σ_P notation means a sum over all permutations 
P of the {x_j}, each of which rearranges them as P{x_j}. 

Being a KL distance, this ρ equals 0 when all the 
distributions are equal, and is never negative. It is not yet 
known though if it is a full-blown multimetric. 
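
One way to read the verbal definition of Eq. (1) is as the KL distance from the product distribution, in which x_i is drawn from π^i, to a "background" joint distribution obtained by averaging that product over all K! assignments of samples to distributions. The Python sketch below implements that reading by brute force for small discrete distributions. The uniform 1/K! weighting is our assumption, chosen so that the quantity vanishes when all the distributions coincide, and the function name is hypothetical.

```python
import math
from itertools import permutations, product


def label_information(distributions):
    """KL distance from the labeled product to its permutation average (nats)."""
    k = len(distributions)
    support = range(len(distributions[0]))
    perms = list(permutations(range(k)))
    total = 0.0
    for xs in product(support, repeat=k):
        # Probability of the K-tuple when x_i is sampled from distribution i.
        joint = math.prod(distributions[i][xs[i]] for i in range(k))
        if joint == 0:
            continue
        # Same tuple with the labels averaged out over all permutations.
        background = sum(
            math.prod(distributions[i][xs[perm[i]]] for i in range(k))
            for perm in perms
        ) / len(perms)
        total += joint * math.log(joint / background)
    return total


same = label_information([[0.3, 0.7], [0.3, 0.7]])
disjoint = label_information([[1.0, 0.0], [0.0, 1.0]])
```

The cost grows as |X|^K times K!, so this brute-force form is only practical for very small K and small alphabets.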


V. EXPERIMENTS 


We illustrate the SD framework with two simple sets 
of computational experiments. The datasets (i.e., the 
q_0’s) in all the experiments are functions over either 
one-dimensional or two-dimensional finite lattices. The SD 
analyses we employed were special cases of Ex. 3, using 
a square observation “window” of width w to specify the 
W_θ. 

In our first experiments our datasets were binary-valued 
(i.e., each q_0 was a map from a lattice into B). 
Accordingly, the task of estimating each scale’s probability 
density, p^θ, simplifies to estimating the probability of 
sequences of w bits. For small w this can be done using 
frequency counts (cf. Ex. 3). We then used a modified 
bounding box multimetric [10]: 

\[
\rho(p^{\theta_1}, p^{\theta_2}, \ldots) = -1 + \sum_i \max\left(p_i^{\theta_1}, p_i^{\theta_2}, \ldots\right) \tag{2}
\]

where p_i^θ is the i’th component of the w-dimensional 
Euclidean vector p^θ. Note that being a multimetric, this 
measure can be used to give both the aggregate 
self-dissimilarity of all distributions {p^θ} as well as the 
distance between any two of the distributions. 

FIG. 1: Self-dissimilarity signatures of binary datasets. Blue 
indicates low dissimilarity (high similarity), and red indicates 
high dissimilarity (low similarity): (a) the repeating sequence 
1111100000, (b) the repeating sequence 1111111000, (c) a 
quasi-periodic sequence, (d) the Cantor set. For these 
datasets the aggregate dissimilarities of the associated 
scale-indexed sets of distributions are 15.5, 13.9, 50.3, and 2.4 
respectively. All signatures were obtained using a window of 
length 9. The signatures (f) and (h) are from the satellite 
images (e) and (g) over Baja California and Greenland 
respectively. A 3x3 window was used for these two-dimensional 
images. 
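
Equation (2) translates directly into code. The sketch below assumes each scale's distribution is given as a list of probabilities over the same outcomes; the function name is ours.

```python
def bounding_box_multimetric(distributions):
    """Eq. (2): -1 plus the sum over components of the componentwise maximum."""
    return -1.0 + sum(max(column) for column in zip(*distributions))


p_a = [1.0, 0.0, 0.0]
p_b = [0.0, 1.0, 0.0]
p_c = [0.0, 0.0, 1.0]

pairwise = bounding_box_multimetric([p_a, p_b])        # distance between two scales
aggregate = bounding_box_multimetric([p_a, p_b, p_c])  # spread of the whole set
identical = bounding_box_multimetric([p_a, p_a])
```

The same function serves for both the pairwise entries of the signature matrix and the aggregate value, which is the practical appeal of a multi-argument measure.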

The pairwise (matrix) SD signatures of six datasets 
are presented in Fig. 1. The integrals were all evaluated by 
Monte Carlo importance sampling. The periodicity of the 
underlying data in Fig. 1(a),(b) is reflected in the repeating 
nature of the SD signature. The quasiperiodic dataset, 
Fig. 1(c), shows hints of periodicity in its signature, and 
significantly greater overall structure. The fractal-like object 
of Fig. 1(d) shows little overall structure (beyond that arising 
from finite-data-size artifacts). Fig. 1(e),(g) show results for 
satellite images which have been thresholded to binary 
values. 

Clustering of these 6 datasets is done by finding the 
partitions of (a), (b), (c), (d), (e), (g) which minimize 
the total intra-group multimetric distance. For 2 clusters 
the optimal grouping is [(a)(b)(c)(e)(g)] and [(d)]; 
for 3 clusters the best grouping is [(a)(b)(c)], [(d)], and 
[(e)(g)]; for 4 clusters the best grouping is [(a)(b)(c)], 
[(d)], [(e)], and [(g)]; and for 5 clusters the best grouping 
is [(a)], [(b)(c)], [(d)], [(e)], and [(g)]. 
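
For a handful of datasets, a partition minimizing the total intra-group distance can be found by exhaustive search. The Python sketch below enumerates all partitions into k non-empty groups and scores each group with a caller-supplied cost. The toy probability vectors and the use of Eq. (2) as the group cost are assumptions made for illustration; the text does not specify which objects the multimetric was applied to when clustering.

```python
def partitions_into(items, k):
    """Yield every partition of `items` into exactly k non-empty groups."""
    if not items:
        if k == 0:
            yield []
        return
    first, rest = items[0], items[1:]
    # Either `first` forms a group of its own ...
    for part in partitions_into(rest, k - 1):
        yield [[first]] + part
    # ... or it joins one of the groups of a partition of the rest.
    for part in partitions_into(rest, k):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def best_partition(items, k, group_cost):
    """Partition into k groups minimizing the summed intra-group cost."""
    return min(
        partitions_into(list(items), k),
        key=lambda part: sum(group_cost(group) for group in part),
    )


# Hypothetical summaries of three datasets as probability vectors.
vectors = {"a": [1.0, 0.0, 0.0], "b": [0.9, 0.1, 0.0], "c": [0.0, 0.0, 1.0]}


def bounding_box_cost(group):
    """Eq. (2) applied to the vectors of one group (zero for a singleton)."""
    return -1.0 + sum(max(col) for col in zip(*(vectors[name] for name in group)))


best = best_partition(["a", "b", "c"], 2, bounding_box_cost)
```

The number of partitions grows quickly with the number of datasets, so this approach suits only small collections such as the six considered here.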

We also provide results for the time series generated 
by the logistic map x_{t+1} = r x_t (1 − x_t), where as usual r 
is a parameter varying from 0 to 4 and 0 < x_t < 1 [15]. 

FIG. 2: Aggregate SD complexity measure as a function of r 
(red line) for the time series generated from the logistic map 
x_{t+1} = r x_t (1 − x_t). The dashed black line corresponds to a 
noisy version of the data where zero-mean Gaussian noise has 
been added. 

We iterated the map 2000 times before collecting data 
to ensure data is taken from the attractor. For each 
r-dependent time series on the attractor we generate a 
self-dissimilarity signature by taking g to be possible 
initial conditions x_0, and W_θ to be a decimation and 
windowing, as in Ex. 3. W_θ acts on a real-valued vector 
x = [x_1, x_2, ...] to return a vector of length 3 whose 
components are x_1, x_{1+θ}, x_{1+2θ}, where the allowed values for θ 
are the positive integers. g and W_θ produce points in R^3. 
Note that in these experiments each p^θ is a probability 
density function over R^3. We estimated each such p^θ by 
centering a zero-mean spherical Gaussian on every vector 
in the associated W_θ[g(q_0)], with an overall covariance 
determined by cross validation. We again used the 
bounding box multimetric [10] of Eq. (2), modified for 
continuous probability densities. The resulting integral 
was evaluated by Monte Carlo importance sampling. 
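
The data-generation part of this experiment can be sketched as follows. The code iterates the logistic map past a burn-in of 2000 steps and then forms the length-3 vectors (x_1, x_{1+θ}, x_{1+2θ}). Sliding the starting index along one trajectory is used here as a stand-in for varying the initial condition, and the initial value, series length, and function names are all assumptions; density estimation and the multimetric integral are not covered.

```python
def logistic_series(r, x0=0.3, burn_in=2000, length=500):
    """Iterate x -> r x (1 - x), discard the transient, return `length` values."""
    x = x0
    for _ in range(burn_in):
        x = r * x * (1 - x)
    series = []
    for _ in range(length):
        x = r * x * (1 - x)
        series.append(x)
    return series


def stencil_vectors(series, theta):
    """All triples (x_t, x_{t+theta}, x_{t+2 theta}) along the series."""
    return [
        (series[t], series[t + theta], series[t + 2 * theta])
        for t in range(len(series) - 2 * theta)
    ]


fixed_point = logistic_series(2.5)                    # settles at 1 - 1/r = 0.6
period_two = stencil_vectors(logistic_series(3.2), 2)  # lag matches the period
```

On the period-2 attractor at r = 3.2, a lag of 2 yields triples with three equal components, whereas a lag of 1 alternates between the two cycle values, so the point clouds at different θ differ visibly.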

The aggregate complexity results are presented as the 
solid red line of Fig. 2. The results confirm what we would like 
to see in a complexity measure. The measure peaks at the 
accumulation point and is low for small r (where there is a 
fixed point) and large r (where the time series is random). 
Additional structure is seen for r > 3.57, paralleling the 
complexity seen in the bifurcation diagram of the logistic 
map. 

To investigate the effects of noise on the SD measure 
we contaminated all time series with zero-mean Gaussian 
noise having a standard deviation of 0.001, and applied 
the same algorithm. The resulting aggregate complexity 
measure is plotted as the black dashed line of Fig. 2. The 
major features of the aggregate SD measure are preserved 
but with some blurring of fine detail. 